Abstract

Dimension reduction is often needed in data mining. The goal of these
methods is to map the given high-dimensional data into a low-dimensional
space while preserving certain properties of the initial data. There are
two kinds of techniques for this purpose. The first, projective methods,
build an explicit linear projection from the high-dimensional space to the
low-dimensional one. The second, nonlinear methods, utilize a nonlinear and
implicit mapping between the two spaces. In both cases, the methods
considered in the literature have usually relied on computationally very
intensive matrix factorizations, frequently the Singular Value Decomposition
(SVD). The computational burden of SVD quickly renders these dimension
reduction methods infeasible because of the ever-increasing sizes of
practical datasets.

In this paper, we present a new decomposition strategy, Reduced Basis
Decomposition (RBD), which is inspired by the Reduced Basis Method (RBM).
Given the high-dimensional data $X$, the method approximates it by $Y\,T$
($\approx X$), with $Y$ being the low-dimensional surrogate and $T$ the
transformation matrix. $Y$ is obtained through a greedy algorithm and is
thus extremely efficient to compute. In fact, the method is significantly
faster than SVD with comparable accuracy. $T$ can be computed on the fly.
Moreover, unlike many compression algorithms, it easily finds the mapping
for an arbitrary “out-of-sample” vector, and it comes with an “error
indicator” certifying the accuracy of the compression. Numerical results are
shown validating these claims.
Keywords:
1. Introduction

Dimension reduction is ubiquitous in many areas ranging from pattern
recognition, clustering, and classification to fast numerical simulation of
complicated physical phenomena. The fundamental question to address is how
to approximate an $n$-dimensional space by a $d$-dimensional one with
$d \ll n$. Specifically, we are given a set of high-dimensional data

    $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{m \times n}$,    (1)

and the goal is to find its low-dimensional approximation

    $Y = [y_1, y_2, \ldots, y_d] \in \mathbb{R}^{m \times d}$    (2)

with reasonable accuracy.

There are two types of dimension reduction methods. The first category
consists of “projective” ones. These are linear methods that are global in
nature and that explicitly transform the data matrix $X$ into a
low-dimensional one by $Y = TX$. The leading examples are Principal
Component Analysis (PCA) and its variants. The methods in the second
category act locally and are inherently nonlinear. For each sample in the
high-dimensional space (e.g. each column of $X$), they directly find its
low-dimensional approximation by preserving certain locality or affinity
between nearby points.

In this paper, inspired by the reduced basis method (RBM), we propose a
linear method called “Reduced Basis Decomposition (RBD)”. It is much faster
than PCA/SVD-based techniques. Moreover, its low-dimensional vectors are
equipped with an error estimator indicating how closely they approximate the
high-dimensional data. RBM is a relatively recent approach to speed up the
numerical simulation of parametric Partial Differential Equations (PDEs)
[Ml Ha 1201 El [5]. It utilizes an Offline-Online computational
decomposition strategy to produce a surrogate solution (of dimension $N$) in
a time that is orders of magnitude shorter than what is needed by the
underlying numerical solver of dimension $\mathcal{N} \gg N$ (called the
truth solver hereafter). The RBM relies on a projection onto a
low-dimensional space spanned by truth approximations at an optimally
sampled set of parameter values p 1 cn ng [13]. This low-dimensional
manifold is generated by a greedy algorithm making use of rigorous a
posteriori error bounds for the field variable and associated functional
outputs of interest, which also guarantee the fidelity of the surrogate
solution in approximating the truth approximation.

The RBD method acts in a similar fashion. Given the data matrix $X$ as in
(1), it iteratively builds up $Y$ in (2), whose column space approximates
that of $X$. It starts with a randomly selected column of $X$ (or a user
input if one exists). At each step, where we have $k$ vectors
$\{y_1, \ldots, y_k\}$, the next vector $y_{k+1}$ is found by scanning the
columns of $X$ and locating the one whose error of projection onto the
current space $\mathrm{span}\{y_1, \ldots, y_k\}$ is the largest. This
process is continued until the maximum projection/compression error is small
enough or until the limit on the size of the reduced space is reached. An
important feature is an offline-online decomposition that allows the
computation of the compression error, and thus the cost of locating
$y_{k+1}$, to be independent of (the potentially large) $m$.
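
One step of this selection can be written compactly. In the sketch below,
$x_j$ denotes the $j$-th column of $X$ and $P_k$ the orthogonal projector
onto $\mathrm{span}\{y_1, \ldots, y_k\}$; both symbols, as well as the
tolerance $\epsilon$, are introduced here only for illustration.

```latex
\[
  i_{k+1} = \arg\max_{1 \le j \le n} \lVert x_j - P_k x_j \rVert ,
  \qquad
  \text{stop when }
  \max_{1 \le j \le n} \lVert x_j - P_k x_j \rVert \le \epsilon
  \text{ or when } k \text{ reaches the prescribed size limit.}
\]
```

The selected column $x_{i_{k+1}}$ then supplies the next vector $y_{k+1}$.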

This paper is organized as follows. In Section 2, we review the background
material, mainly the RBM. Section 3 describes the reduced basis
decomposition algorithm and discusses its properties. Numerical validations
are presented in Section 4, and finally some concluding remarks are offered
in Section 5.

2. Background 

The reduced basis method was developed for use with finite element methods
to numerically solve PDEs. We assume, for simplicity, that the problems
(usually parametric partial differential equations (PDEs)) to simulate are
written in the weak form: find $u(\mu)$ in a Hilbert space $X$ such that
$a(u(\mu), v; \mu) = f(v; \mu)$, $\forall v \in X$, where $\mu$ is an input
parameter. These simulations need to be performed for many values of $\mu$
chosen in a given parameter set $\mathcal{D}$. In this problem $a$ and $f$
are bilinear and linear forms, respectively, associated to the PDE (with
$a^{\mathcal{N}}$ and $f^{\mathcal{N}}$ denoting their numerical
counterparts). We assume that there is a numerical method to solve this
problem and that the solution $u^{\mathcal{N}}$, called the “truth
approximation” or “snapshot”, is accurate enough for all
$\mu \in \mathcal{D}$.

The fundamental observation utilized by RBM is that the parameter-dependent
solution $u^{\mathcal{N}}(\mu)$ is not simply an arbitrary member of the
infinite-dimensional space associated with the PDE. Instead, the solution
manifold $\mathcal{M} = \{u^{\mathcal{N}}(\mu), \mu \in \mathcal{D}\}$ can
typically be well approximated by a low-dimensional vector space. The idea
is then to propose an approximation of $\mathcal{M}$ by
$W^N = \mathrm{span}\{u^{\mathcal{N}}(\mu_1), \ldots, u^{\mathcal{N}}(\mu_N)\}$,
where $u^{\mathcal{N}}(\mu_1), \ldots, u^{\mathcal{N}}(\mu_N)$ are $N$
($\ll \mathcal{N}$) pre-computed truth approximations corresponding to the
parameters $\{\mu_1, \ldots, \mu_N\}$ judiciously selected according to a
sampling strategy [13]. For a given $\mu$, we now solve in $W^N$ for the
reduced solution $u^{(N)}(\mu)$. The online computation is
$\mathcal{N}$-independent, thanks to the assumption that the (bi)linear
forms are affine and the fact that they can be approximated by affine
(bi)linear forms when they are nonaffine [21 [10]. Hence, the online part is
very efficient. In order to be able to “optimally” find the $N$ parameters
and to assure the fidelity of the reduced basis solution $u^{(N)}(\mu)$ in
approximating the truth solution $u^{\mathcal{N}}(\mu)$, we need an a
posteriori error estimator $\Delta_N(\mu)$, which involves the residual
$r(v; \mu) = f^{\mathcal{N}}(v; \mu) - a^{\mathcal{N}}(u^{(N)}(\mu), v; \mu)$
and stability information of the bilinear form [T21 HH CHI EHl El]. With
this estimator, we can describe briefly the classical greedy algorithm used
to find the $N$ parameters $\mu_1, \ldots, \mu_N$ and the space $W^N$. We
first randomly select one parameter value and compute the associated truth
approximation. Next, we scan the entire (discrete) parameter space and, for
each parameter in this space, compute its RB approximation and the error
estimator $\Delta_1(\mu)$. The next parameter value we select, $\mu_2$, is
the one corresponding to the largest error estimator. We then compute the
truth approximation and thus have a new basis set consisting of two
elements. This process is repeated until the maximum of the error estimators
is sufficiently small.
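
The greedy loop just described amounts to the following rule. Here $\Xi$
(the discrete training set of parameters) and $\epsilon_{\mathrm{tol}}$ (the
stopping tolerance) are symbols assumed for this sketch, and
$\Delta_N(\mu)$ denotes the error estimator associated with the current
$N$-dimensional reduced space.

```latex
\[
  \mu_{N+1} = \arg\max_{\mu \in \Xi} \Delta_N(\mu),
  \qquad
  \text{repeat until }
  \max_{\mu \in \Xi} \Delta_N(\mu) \le \epsilon_{\mathrm{tol}} .
\]
```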

The reduced basis method typically has exponential convergence with respect
to the number of pre-computed solutions mm-. This means that the number of
pre-computed solutions can be small, and thus the computational cost reduced
significantly, while the reduced basis solution still approximates the
finite element solution reasonably well. The author and his collaborators
showed [7] that it works well even for a complicated geometric
electromagnetic scattering problem, where it efficiently reveals a very
sensitive angle dependence (the object being stealthy with a particular
configuration).


3. Reduced basis decomposition

In this section, we detail our proposed methodology by stating the
algorithm, studying the error evaluation, and pinpointing the computational
cost.

3.1. The algorithm 

At the heart of the method stated in Algorithm 1 is a greedy algorithm
similar to that used by RBM. It builds the reduced space
dimension-by-dimension. At each step, the greedy decision for the best next
dimension to pursue in the space corresponding to the data is made by
examining an error indicator quantifying the discrepancy between the
uncompressed data and the data compressed into the current (reduced) space.

In the context of the RBM, we view each column (or row if we are compressing
the row space) of the matrix as the fine solution of a certain (virtual)
parametric PDE with the (imaginary) parameter taking a particular value.
Since this solution is explicitly given already by the data, the fact that
the PDE and the parameter are absent does not matter. Once this common
mechanism satisfied by each column (or row) is identified, the greedy
algorithm still relies on an accurate and efficient estimate quantifying the
error between the original data and the compressed data. This will be the
topic of the next subsection.

To state the algorithm, we assume that we are given a data matrix
$X \in \mathbb{R}^{m \times n}$, the largest dimension $d_{\max}$ that the
practitioner wants to retain, and a tolerance $\epsilon_R$ capping the
discrepancy between the original and the compressed data. The output is the
set of bases for the compressed data (a low-dimensional approximation of the
original data) $Y \in \mathbb{R}^{m \times d}$ and the transformation matrix
$T \in \mathbb{R}^{d \times n}$. Here, $d \le d_{\max}$ is the actual
dimension of the compressed data.

With this output, we can

Compress. We represent any data entry $X(:, j)$, the $j$-th column of $X$
in $\mathbb{R}^m$, by the $j$-th column of $T$,
$T(:, j) \in \mathbb{R}^d$, with usually $d \ll m$.

Uncompress. An approximation of the data is reconstructed by the product
$Y\,T \approx X$.

Evaluate the compression of out-of-sample data. Given any
$v \in \mathbb{R}^m$ that is not equal to any column of $X$, its compressed
representation in $\mathbb{R}^d$ is $c = Y'v$.

Algorithm 1 Reduced Basis Decomposition
$(Y, T) = \mathrm{RBD}(X, \epsilon_R, d_{\max})$

1. Set $d = 1$, $E_{cur} = +\infty$, and $i$ a random integer between 1
   and $n$.
2. while $d \le d_{\max}$ and $E_{cur} > \epsilon_R$ do
   2.1. $v = X(:, i)$.
   2.2. Apply the modified Gram-Schmidt orthonormalization to obtain the
        $d$-th basis of the compressed space:
            for $j = 1 : d - 1$ do
                $v = v - (v \cdot \xi_j)\, \xi_j$
            end for
            if $\|v\| < \epsilon_R$ then
                $Y = Y(:, 1 : d - 1)$
                $T = T(1 : d - 1, :)$
                Break;
            else
                $\xi_d = v / \|v\|$, $Y(:, d) = \xi_d$
                $T(d, :) = \xi_d' X$
            end if
   2.3. $E_{cur} = \max_{j \in \{1, \ldots, n\}} \|X(:, j) - Y(:, 1 : d)\, T(:, j)\|$
        and
        $i = \arg\max_{j \in \{1, \ldots, n\}} \|X(:, j) - Y(:, 1 : d)\, T(:, j)\|$
   2.4. if $E_{cur} \le \epsilon_R$ then
            $Y = Y(:, 1 : d)$, $T = T(1 : d, :)$.
        else
            $d = d + 1$.
        end if
   end while
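
The steps of Algorithm 1 can be sketched in a few lines of NumPy. This is a
minimal illustration rather than the author's implementation: it assumes the
Euclidean norm ($A = I$), counts accepted basis vectors from zero, and
evaluates the error in step 2.3 directly instead of through the
offline-online splitting of the next subsections. The function name `rbd`
and its `seed` argument are hypothetical.

```python
import numpy as np


def rbd(X, eps_r, d_max, seed=0):
    """Greedy reduced basis decomposition sketch with A = I.

    Returns Y (m x d, orthonormal columns) and T (d x n) with X ~= Y @ T.
    """
    m, n = X.shape
    rng = np.random.default_rng(seed)
    Y = np.zeros((m, d_max))
    T = np.zeros((d_max, n))
    i = int(rng.integers(n))          # step 1: random starting column
    d = 0                             # number of basis vectors accepted so far
    while d < d_max:
        v = X[:, i].astype(float)     # step 2.1: currently worst column
        for j in range(d):            # step 2.2: modified Gram-Schmidt
            v = v - (v @ Y[:, j]) * Y[:, j]
        norm_v = np.linalg.norm(v)
        if norm_v < eps_r:            # nothing new left to add
            break
        Y[:, d] = v / norm_v
        T[d, :] = Y[:, d] @ X
        d += 1
        # step 2.3: column-wise error, computed directly in this sketch
        errors = np.linalg.norm(X - Y[:, :d] @ T[:d, :], axis=0)
        i = int(np.argmax(errors))
        if errors[i] <= eps_r:        # step 2.4: tolerance met
            break
    return Y[:, :d], T[:d, :]
```

Calling `Y, T = rbd(X, eps_r, d_max)` returns factors whose product
`Y @ T` approximates `X` column by column to within `eps_r`, provided
`d_max` is large enough for the tolerance to be reached.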


3.2. Efficient quantification of the error 

A critical part to facilitate the greedy algorithm and make the algorithm
realistic is an efficient mechanism measuring (or estimating) the error
$v - v_c$ under a certain norm, $\|v - v_c\|$, in Step 2.3 of the
algorithm. In this work, we are using the $A$-norm defined as follows. For a
given symmetric and positive definite matrix
$A \in \mathbb{R}^{m \times m}$, the $A$-norm of a vector
$v \in \mathbb{R}^m$ is defined by

    $\|v\|_A := \sqrt{v'Av}$.

For $v$ being any column of the data matrix $X$ and $v_c$ its
low-dimensional approximation $v_c = Yc$, it is easy to see that

    $\|v - v_c\|_A^2 = v'Av - 2 v_c'Av + v_c'Av_c$    (3)
                     $= v'Av - 2c'Y'Av + c'Y'AYc$.

The choice of $A$ reflects the criteria of the data compression. Typical
examples are:

1. Identity: Equal weights are assigned to each component of the data
entry. This makes the quality of compression uniform. In this case, the
evaluation of (3) is greatly simplified and the algorithm is the fastest,
as shown below by the numerical results.

2. General diagonal matrix: This setting can be used if part of each
data entry needs to be preserved better and other parts can afford less
fidelity.

3. General SPD matrix: This most general case can be helpful if the
goal is to preserve data across different entries anisotropically.

The goal is then to evaluate the error through (3) as efficiently as
possible for any given $c$. This is achieved by employing an offline-online
decomposition strategy where the $c$-independent parts are evaluated
beforehand (offline), enabling a quick turnaround time for any given $c$
encountered online. The specifics are given in the next subsection.
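
The splitting of (3) into $c$-independent and $c$-dependent parts can be
illustrated as follows. This is a sketch under stated assumptions, not the
author's code: the function names are hypothetical, $A$ is passed as a dense
array for simplicity, and the coefficients of all columns are stored as the
columns of a $d \times n$ array `C`.

```python
import numpy as np


def offline_terms(X, Y, A):
    """Precompute the c-independent pieces of the squared A-norm error."""
    AX = A @ X
    xAx = np.einsum("ij,ij->j", X, AX)   # diag(X' A X), one entry per column
    YAX = Y.T @ AX                        # d x n
    YAY = Y.T @ (A @ Y)                   # d x d
    return xAx, YAX, YAY


def squared_errors(C, xAx, YAX, YAY):
    """Squared A-norm error of X(:, j) ~ Y C(:, j) for every column j."""
    cross = np.einsum("ij,ij->j", C, YAX)        # c' Y' A v
    quad = np.einsum("ij,ij->j", C, YAY @ C)     # c' Y' A Y c
    return xAx - 2.0 * cross + quad
```

Once `offline_terms` has been evaluated, `squared_errors` touches only
arrays whose sizes depend on $d$ and $n$, not on $m$.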
3.3. Computational cost and implementation aspects

The Offline-Online decomposition of the computations and their complexities
are as follows. Here, we use $\mathrm{nnz}(A)$ to denote the number of
nonzero entries of a sparse matrix $A$.

Offline. The total cost is of order

    $(m + \mathrm{nnz}(A))\, d_{\max}^2 + (d_{\max}^3 + \mathrm{nnz}(A))\, n$.

    Offline MGS. Every basis needs to be orthogonalized against the current
    set of bases. The total cost is of order
    $(m + \mathrm{nnz}(A))\, d_{\max}^2$.

    Offline Calculation of Errors. The next basis is located by comparing
    each column with its compressed version in the current space. To enable
    that, we encounter the following computational cost:

        Pre-computation of $\mathrm{diag}(X'AX)$ (for $v'Av$ in (3)) and
        $AX$ (for $Av$ in (3)). The cost is of order $\mathrm{nnz}(A)\, n$.

        Expansion of $Y'AX$ and $Y'AY$. The former takes time of order
        $\mathrm{nnz}(A)\, d_{\max}\, m$, and the latter of order
        $\mathrm{nnz}(A)\, d_{\max}^2$.

    Offline Searching. After these calculations, the comparison between the
    original and compressed data is then only dependent on the size of $c$
    (which is also the number of columns of $Y$). The complexity is of
    order $d_{\max}^2$, to be repeated for up to $n$ times in the searching
    process of step 2.3 of the algorithm for each of the up to $d_{\max}$
    basis elements. The total cost is at the level of $n\, d_{\max}^3$.

Online. Given any (possibly out-of-sample) data $v \in \mathbb{R}^m$, its
coefficients in the compressed space are obtained by evaluating $c = Y'v$.
The cost is of order $m\, d_{\max}$. The decoding ($Yc$) can be done with
the same cost. The online computation has complexity of order

    $m\, d_{\max}$.

We remark that, if the actual practice does not require forming $v_c$
(e.g. clustering and classification) and so we only work with the
coordinates $c$ of $v$ in the compressed space, then the online cost will
be independent of $m$ and thus much smaller.
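
The online stage is short enough to spell out. The sketch below assumes
that $Y$ has orthonormal columns and that $A = I$; the function names are
hypothetical and only illustrate the encoding $c = Y'v$, the decoding $Yc$,
and a directly computed error for an out-of-sample vector.

```python
import numpy as np


def compress(Y, v):
    """Online encoding: coefficients of v in the compressed space."""
    return Y.T @ v


def uncompress(Y, c):
    """Online decoding: reconstruction of v from its coefficients."""
    return Y @ c


def error_indicator(Y, v):
    """Euclidean distance between v and its reconstruction (A = I)."""
    return np.linalg.norm(v - uncompress(Y, compress(Y, v)))
```

Both `compress` and `uncompress` are single matrix-vector products with an
$m \times d$ matrix, which is where the $m\, d_{\max}$ cost comes from.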


4. Numerical Results 

In this section, we test the reduced basis decomposition on image
compression and data compression. Lastly, we devise a simple face
recognition algorithm based on RBD and test it on a database of 575 images
while comparing RBD with 6 other face recognition algorithms. All
computations were done in Matlab 2014a on a 2011 iMac with a 3.4 GHz Intel
Core i7 processor, and the speedup numbers reported herein should be
understood in that setting.

4.1. Image Compression and comparison with SVD

We first test it on compressing two standard images, Lena and Mandrill,
shown in Figure 1. They both have an original resolution of 512 x 512. We
take $A = I$ and test the algorithm. For each component of every image, we
run the algorithm with $d_{\max} \in \{170, 51, 25\}$, which implies a
compression ratio of 33%, 10%, and 5% respectively. The resulting images
(formed by multiplying the corresponding $Y$ and $T$ together) are shown on
the first and third rows of Figure 2. As a comparison, we run SVD and obtain
the reconstructed matrices with the first $d_{\max}$ singular values
accordingly. The resulting images are on the second and last rows. Clearly,
SVD provides the best quality pictures among all possible algorithms (and
thus better than what RBD provides). However, we see that the RBD pictures
are only slightly blurrier. Moreover, RBD takes much less time. In fact, we
show the comparison in time between SVD and RBD in Table 1. We see that RBD
is three times faster than SVD when $d = 51$ and seven times faster when
$d = 25$. Here the SVD time is the shorter of those taken by the svd and
svds commands in Matlab.

Figure 1: Original pictures: Lena and Mandrill.


Picture              RBD     SVD
Lena, d = 170        3.57    1
Lena, d = 51         0.30    1
Lena, d = 25         0.14    1
Mandrill, d = 170    3.22    1
Mandrill, d = 51     0.31    1
Mandrill, d = 25     0.14    1

Table 1: Relative computational time for image compression.
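
For readers who want to reproduce the SVD side of such a comparison in
outline, a rank-$d$ truncated SVD can be written as below. This is a NumPy
sketch, not the Matlab svd/svds calls behind the timings in Table 1, and the
function name `truncated_svd` is hypothetical. It returns factors in the
same $Y$, $T$ layout used for RBD so that the two reconstructions can be
compared directly.

```python
import numpy as np


def truncated_svd(X, d):
    """Rank-d truncated SVD, returned as Y (m x d) and T (d x n)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the d leading singular triplets; fold the singular values into T.
    return U[:, :d], s[:d, None] * Vt[:d, :]
```

The product of the two returned factors is the best rank-$d$ approximation
of `X` in the Frobenius norm, which is why the SVD images serve as the
quality reference.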


4.2. Data Compression

Here, we test the algorithm on a few artificially-generated data sets.
Given a function $f(x, y)$, the data denoted by $f(\mathcal{D})$ is
constructed by evaluating $f$ on a uniform tensorial grid $\mathcal{D}$.

4.2.1. Exact reconstruction

For tensorial functions such as those listed in Table 2 with their
corresponding $d$ values, the RBD method detects the optimal dimension,
stops the greedy algorithm after $d$ steps, and decomposes the matrix
$f(\mathcal{D})$ accordingly, that is, as an exact product of $n \times d$
and $d \times n$ matrices.
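
That such functions produce exactly low-rank data can be checked
numerically. The sketch below samples the three-term function of Table 2 on
a grid and computes the numerical rank of the resulting matrix; the grid
size (257 points per direction on $[-1, 1]$) and the rank tolerance are
choices made here for illustration, with the grid fine enough to resolve the
highest frequency.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 257)          # illustrative grid, not the paper's
gx, gy = np.meshgrid(x, x, indexing="ij")


def f3(x, y):
    """Three-term function of Table 2: a sum of three separable terms."""
    return (np.sin(np.pi * x) * np.cos(np.pi * y)
            + 0.1 * np.sin(10 * np.pi * x) * np.cos(10 * np.pi * y)
            + 0.01 * np.sin(100 * np.pi * x) * np.cos(100 * np.pi * y))


# Each separable term contributes one outer product, hence one rank.
rank = np.linalg.matrix_rank(f3(gx, gy), tol=1e-10)
```

Each term of the form $g(x)\,h(y)$ sampled on a tensorial grid is an outer
product of two vectors, so a sum of $d$ such terms has rank at most $d$.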


4.2.2. Approximate reconstruction

Here, we set $f(x, y) = 0.6 f_1(x, y) + 0.1 f_2(x, y) + 0.01 f_3(x, y)$
with:

    $f_1(x, y) = \sin(\pi(x + 2y)) \cos(\pi(2x - y))$
    $f_2(x, y) = \sin(10\pi(x - 3y)) \cos(10\pi(3x + y))$
    $f_3(x, y) = \sin(3\pi x^{\ldots} y) \cos\left(6\pi \frac{\ldots}{y + 2}\right)$

and let $\mathcal{D}$ be a 5001 x 5001 uniform grid on
$[-1, 1] \times [-1, 1]$. Setting $d = 22$, the method extracts 22 columns
and decomposes $f(\mathcal{D})$ by a product of two matrices of size
5001 x 22 and 22 x 5001. The compression ratio is larger than 110. More
importantly, the reconstruction, plotted in Figure 3 Left, has point-wise
error below $10^{-8}$.

Figure 2: Lena and Mandrill compressed, d = 170, 51, 25 from left to right.
The first and third rows are from Reduced Basis Decomposition, and the
second and fourth are from Singular Value Decomposition.

Function f(x, y)                                      Intrinsic dimension
sin(πx) cos(πy)                                       1
sin(πx) cos(πy) + 0.1 sin(10πx) cos(10πy)             2
sin(πx) cos(πy) + 0.1 sin(10πx) cos(10πy)
    + 0.01 sin(100πx) cos(100πy)                      3

Table 2: Three functions with low intrinsic dimensions that can be
compressed by RBD exactly.
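
The compression ratio quoted for the 5001 x 5001 example can be checked by
hand, assuming it is measured as the number of entries of the original
matrix divided by the number of entries stored in the two factors (the text
does not spell out the definition, so this reading is an assumption):

```latex
\[
  \frac{m\,n}{d\,(m + n)}
  = \frac{5001 \times 5001}{22 \times (5001 + 5001)}
  = \frac{5001}{44}
  \approx 113.7 > 110 .
\]
```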

We calculate the point-wise reconstruction error for reduced basis
decomposition, $e_R(d) = \|f(\mathcal{D}) - Y(:, 1 : d)\, T(1 : d, :)\|$. As
a comparison, we calculate the first 22 singular values $s_i$ of
$f(\mathcal{D})$, the corresponding singular vectors $(u_i, v_i)$, and the
reconstruction error
$e_S(d) = \|f(\mathcal{D}) - \sum_{i=1}^{d} s_i u_i v_i^T\|$. These two
errors are plotted in Figure 3 Right. We see that RBD matches SVD in terms
of accuracy. We emphasize that what is striking is its efficiency. The RBD
code, as implemented by the author (available at
www.faculty.umassd.edu/yanlai.chen/RBD), is 16 times faster than the svds
command in Matlab.

Figure 3: Artificial dataset: The reconstructed contour plot from compressed
data (left), and the comparison of the history of convergence (RBD vs SVD)
as d increases (right).

Data set    No. of classes    No. of samples per class
UMIST       20                19-48

Table 3: Data set information.


4.3. Face Recognition

Here, we demonstrate the superior efficiency and accuracy of the RBD method
on a classical classification task: face recognition. The goal of face
recognition is to recognize subjects based on facial images. It has
important applications in areas ranging from surveillance and authentication
to human-computer interaction.



Figure 4: A snapshot of the UMIST data set. 


We use the UMIST database [9] that is publicly available on Roweis’ web
page (http://www.cs.nyu.edu/~roweis/data.html). Table 3 summarizes its
characteristics: it contains 20 people under different poses. The number of
different views per subject varies from 19 to 48. We use the cropped
version, whose snapshot is shown in Figure 4.

As in m , we randomly choose 10 views from each class to form a training
set. The rest of the samples (375 of them) are used as testing images. We
show the average classification error rates in Figure 5 Left. These averages
are computed over 100 random formations of the training and test sets. Shown
in the middle are the results of six traditional dimension reduction
techniques taken from m-. Clearly, our method has similar performance to the
PCA method, and outperforms three of the other five methods. However, RBD is
much faster than PCA and the other methods since they all involve solving
eigenproblems m-. A speedup factor as a function of the number of bases is
plotted in Figure 5 Right, which demonstrates a speedup factor larger than
two for this particular test when we reach the asymptotic region (around
when the number of basis vectors is 25).

Figure 5: Comparison of RBD and other face recognition algorithms. Left:
classification error for RBD; Middle: classification error for six
traditional methods mi; Right: Speedup factor of RBD over PCA.

5. Concluding remarks 

This paper presents and tests an extremely efficient dimension reduction
algorithm for data processing. It is multiple times faster than the
SVD/PCA-based algorithms. What makes this possible is a greedy algorithm
that iteratively builds up the reduced space of basis vectors. Each time,
the next dimension is located by exploring the errors of compression into
the current space for all data entries. Thanks to an offline-online
decomposition mechanism, this searching is independent of the size of each
entry. Numerical results, including one concerning a real world face
recognition problem, confirm these findings.